Various innovative educational and instructional tools, including video
content, have been created to deliver learning material. An important issue
in video-based learning is devising effective teaching strategies that help
students achieve a higher level of learning. Gaining insight into, and
predicting, students’ adoption of video-based learning will help educators.
This study therefore examines the potential of machine learning prediction
models for video-based learning adoption in higher education institutions.
Five machine learning algorithms were compared empirically: generalized
linear model (GLM), random forest (RF), decision tree (DT), gradient boosted
tree (GBT), and support vector machine (SVM). The performance of each
algorithm in predicting students’ adoption of video-based learning was
observed using attributes drawn from task-technology fit theory. The findings
indicate that the task-technology fit attributes help the machine learning
algorithms achieve high accuracy in predicting video-based learning adoption.
GBT is the best-performing algorithm, followed by RF and SVM. This paper
presents a fundamental research framework that can help educators and
researchers enhance student interest and retention in video-based learning.

1. INTRODUCTION 


The sudden outbreak of COVID-19 changed the education system dramatically. The social confinement
measures introduced and enforced by most countries led to the suspension of all education activities,
including conventional face-to-face teaching. During this challenging period, online learning was regarded
as an alternative that keeps the learning process going while minimizing the health risk to both educators
and students, because teaching is undertaken remotely on digital platforms [1], [2]. In response to this new
learning environment, educators created various innovative educational and instructional tools, including
video content, to deliver learning material.

Video content, also known as video-based learning [3], has been used in online learning for a long
time. Educators create video content as a teaching tool because it can increase students’ knowledge
and understanding [4] and improve their study habits [5]. Besides, unlike text-only content, video lectures
consist of multimedia elements that give students additional information, which eventually leads to
better learning and retention [6]. In addition, video content allows for practical and experiential learning,
especially for complex concepts, for example by simulating a laboratory experiment [7] or giving
practical demonstrations [8].

In line with the increasing demand for video-based learning and the benefits it offers, the number of
studies on students’ intention to use and adoption of video content was already growing before the COVID-19
outbreak. However, the main findings of these prior studies are mixed and inconclusive. For example,
although the researchers in [9], [10] found positive student perceptions of video-based learning, [11], [12]
failed to find significant differences between traditional and video teaching tools. In addition, [13] adds
that video-based learning might have an adverse impact on students’ learning outcomes and in-class
performance, as some of them tend to skip video lessons.

This study extends prior research by examining students’ actual adoption of video-based learning in
higher educational institutions in a unique academic setting: forced online learning during the COVID-19
outbreak. Unlike prior studies [7]-[11], which employed traditional statistical methods, this study models
video-based learning usage from task-technology fit attributes using machine learning prediction
techniques [14]. Prior research highlights the effectiveness and accuracy of machine learning approaches for
prediction and classification in financial and accounting studies, including the prediction of firm financial
distress [15], tax aversion and avoidance [16], [17], and auditor choice [18]. However, to date very few
studies have employed machine learning prediction and classification in accounting education.

This study makes two main contributions. First, it extends prior work [7]-[11] by constructing a
video-based learning adoption model that deepens current understanding of the acceptance of video as an
educational and instructional tool in remote learning environments, especially during the COVID-19
pandemic. Second, it provides a design and implementation of machine learning prediction for video-based
learning that uses three constructs of task-technology fit theory: task characteristics, technology
characteristics, and individual characteristics.


2. RESEARCH METHOD 
2.1. Data collection and datasets 

The data for this study were collected using a questionnaire survey comprising two sections. The
first section covers the respondents’ socio-demographic characteristics, such as gender, academic
performance, residential area, and monthly family income, as well as their exposure to video-based learning
before the COVID-19 pandemic. Academic performance was measured by the cumulative grade point average
(CGPA), which is obtained from the grade points a student is awarded every semester, divided by the total
number of credits registered. The second section of the questionnaire was developed from the three main
constructs of the proposed theory: task-technology fit, technology characteristics, and individual
characteristics. Each construct was measured on a five-point Likert scale ranging from 1=strongly disagree
to 5=strongly agree, and the estimate for each construct was obtained as the average value of its
indicators. Following [19], the specific indicators used to measure each construct were adapted from
[20]-[22]. To assess the actual usage of video-based learning, three indicators adapted from the prior
studies in [20], [23] were used.
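
The construct scoring described above can be illustrated with a minimal Python sketch. It assumes each
construct is scored as the plain arithmetic mean of its 1 to 5 Likert responses; the construct keys, the
number of indicators per construct, and the answers are hypothetical and are not taken from the study's
questionnaire.

```python
from statistics import mean


def construct_score(responses):
    """Average the 1-5 Likert responses given to one construct's indicators."""
    if not responses:
        raise ValueError("a construct needs at least one indicator")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("Likert responses must be integers from 1 to 5")
    return mean(responses)


# Hypothetical respondent: names, indicator counts and answers are made up.
respondent = {
    "task_technology_fit": [4, 5, 4],
    "technology_characteristics": [3, 4, 4, 5],
    "individual_characteristics": [2, 3, 3],
}

# One averaged score per construct, usable as an input attribute for a model.
scores = {name: construct_score(items) for name, items in respondent.items()}
```

Averaging keeps every construct on the original 1 to 5 scale regardless of how many indicators it has,
which makes the resulting attributes directly comparable.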

The questionnaires were self-administered to undergraduate accounting students at a public
university in Malaysia during the second semester of 2021/2022. Because of the COVID-19 pandemic, the
university was still implementing remote teaching for the whole semester, and most subjects used live or
prerecorded video in the learning process. The questionnaires were administered after the purpose of the
study had been explained to the students and after 30 minutes of video teaching material had been delivered
to them. Of the 280 questionnaires administered, 103 valid responses were used for the analysis,
representing a response rate of 36.79%.


2.2. Correlations of variables 

Table 1 lists the independent variables (IVs) used to predict video-learning usage. Based on the
Pearson correlation test, the task-technology fit attributes, except for individual characteristics, show
strong positive correlations (correlation coefficients above 0.8) with video-learning usage. The correlation
coefficient of individual characteristics is only about 0.3, which is low and comparable to that of the IVs
from the demographic attributes.
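
For readers who want to reproduce this kind of check, the Pearson coefficient can be computed with a few
lines of Python. This is a minimal sketch using only the standard library; the function name and the toy
score lists are assumptions for illustration and are not the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


# Toy construct scores for five hypothetical respondents (not study data).
fit_scores = [2.0, 3.0, 3.5, 4.0, 4.5]
usage_scores = [2.5, 3.0, 3.5, 4.5, 4.5]
r = pearson_r(fit_scores, usage_scores)
```

A value of r close to 1 indicates that respondents with a higher attribute score also tend to report higher
usage, which is the pattern reported for the two strongest IVs.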

For each machine learning model, the contribution of each IV listed in Table 1 is observed and
compared across the different algorithms. Although only two IVs are strongly correlated with the dependent
variable, the remaining low-correlation IVs are expected to provide some useful knowledge to the prediction
models. It is therefore interesting to observe how each IV from the task-technology fit and demographic
attributes correlates with and affects the different machine learning models.
Table 1. Pearson correlation of each IV to the dependent variable (DV) 

Attribute                   Correlation coefficient 
Task-tech. fit              0.831 
Tech. characteristics       0.825 
Ind. characteristics        0.327 
CGPA                        0.306 
Urban residential area      0.154 
Prior exposure              0.071 
Monthly household income    0.057 
Gender                      0.017 


2.3. Machine learning 

Five machine learning algorithms, namely generalized linear model (GLM) [24], random forest (RF)
[25], decision tree (DT) [25], gradient boosted trees (GBT) [25], and support vector machine (SVM) [26],
were executed on the RapidMiner platform on a computer with 16 GB of RAM. These five algorithms were
selected based on preliminary findings from RapidMiner’s Auto Model module, which uses an optimization
search strategy to identify suitable algorithms for a given dataset. Table 2 lists the optimal parameter
set of each machine learning algorithm obtained from the preliminary hyper-parameter tuning.


Table 2. Configuration of parameters 

Algorithm   Optimal parameters      Error rate (%) 
RF          Number of trees=20      5.6 
            Maximal depth=4 
DT          Maximal depth=7         5.7 
GBT         Number of trees=30      6.6 
            Maximal depth=4 
            Learning rate=0.1 
SVM         Kernel gamma=0.005      4.5 
            C=10 

The numbers of trees used in the preliminary hyper-parameter tuning of RF were 20, 60, 100, and 140.
For each of the four numbers of trees, three values of maximal depth (2, 4, and 7) were tested. The worst
error rate was 7.7%, obtained with 140 trees and a maximal depth of 2. The best error rate was 5.6%,
obtained with the configuration given in Table 2.
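
The RF tuning described above amounts to an exhaustive search over 4 x 3 = 12 parameter combinations.
The following Python sketch shows that search pattern only. The study ran its tuning inside RapidMiner, so
the `grid_search` helper, the parameter key names, and the `toy_error` objective are all hypothetical
stand-ins; in particular, `toy_error` does not reproduce any measured error rate.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Try every parameter combination and return (best_params, best_error)."""
    names = list(grid)
    best_params, best_error = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        error = evaluate(params)  # lower is better
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


# Candidate values quoted in the text for the RF tuning.
rf_grid = {"number_of_trees": [20, 60, 100, 140], "maximal_depth": [2, 4, 7]}


def toy_error(params):
    # Placeholder objective chosen only so the example has a unique minimum.
    # A real run would train a model and return its validation error rate.
    return abs(params["maximal_depth"] - 4) + params["number_of_trees"] / 1000


best_params, best_error = grid_search(rf_grid, toy_error)
```

The same helper would cover the other algorithms by swapping in their own grids, for example adding a
learning-rate entry for GBT.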

For DT, the maximal depth used in the preliminary testing ranged from 2 to 25. The highest error
rate was 8.9% at a maximal depth of 2, which was reduced to 5.9% with a maximal depth between 4 and 7.
The error rate remained constant at 5.7% when the maximal depth was set to 7, 10, 15, or 25.

In addition to the number of trees and maximal depth, GBT has a learning rate parameter. The number
of trees used in the preliminary tuning ranged from a minimum of 30 to a maximum of 150, with maximal
depths of 2, 4, and 7. The learning rate was varied between 0.001 and 0.1. The highest error rate was
11.6%, obtained with 30 trees, a maximal depth of 2, and a learning rate of 0.001. The lowest error rate
(6.6%) was observed when the number of trees remained at 30 but the maximal depth and the learning rate
were set to 4 and 0.1, respectively.

SVM uses the kernel gamma and C (regularization) parameters, which were varied in the preliminary
tuning between 0.005 and 5 for kernel gamma and between 10 and 100 for C. The worst setting was a kernel
gamma of 0.05 with C=100, which reached an error rate of 55.7%. The best setting was a kernel gamma of
0.005 with C=10, which completed the prediction with an error rate of only 4.5%.

To separate the training and testing datasets, this research used a split-training approach with a
ratio of 60:40, based on the configuration suggested by RapidMiner’s Auto Model. Therefore, of the 103
records, 62 were used for machine learning training and 41 were used for machine learning testing.
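
The 62/41 partition follows directly from applying the 60:40 ratio to 103 records. The sketch below
reproduces those counts in Python under two assumptions that the paper does not state: the records are
shuffled before splitting, and the training size is rounded to the nearest integer. The function name and
the seed are illustrative only and do not describe RapidMiner's internal sampling.

```python
import random


def split_train_test(records, train_ratio=0.6, seed=42):
    """Shuffle the records and split them into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in for the 103 valid responses: only the record count matters here.
responses = list(range(103))
train, test = split_train_test(responses)
```

With so few testing records, each misprediction shifts the reported error noticeably, which is worth
keeping in mind when comparing algorithms whose scores are close.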


3. RESULTS AND DISCUSSION 

Two sets of results are presented. First, the performance of the machine learning algorithms in the
video-learning usage prediction model is given in Table 3. Second, the way the task-technology fit
attributes and the students’ demographic attributes affect the prediction model under each algorithm is
presented after that.
Table 3. The performance results 

Algorithm   RMSE (±std. dev.)   R² (±std. dev.)   Time to complete (ms) 
GLM         0.307 (0.09)        0.746 (0.746)     81 
RF          0.293 (0.06)        0.895 (0.08)      223 
DT          0.322 (0.06)        0.765 (0.126)     50 
GBT         0.287 (0.02)        0.911 (0.05)      5,000 
SVM         0.298 (0.08)        0.78 (0.20)       1,000 
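
The two accuracy metrics reported in Table 3 can be sketched in Python as follows. The R squared function
assumes the coefficient-of-determination definition (one minus the ratio of residual to total sum of
squares); the tool used in the study may instead report a squared correlation, so this is an illustration
rather than the exact computation behind the table. The sample values are made up.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when actual values are constant")
    return 1 - ss_res / ss_tot


# Made-up usage scores and predictions for four hypothetical students.
actual_usage = [3.0, 4.0, 5.0, 2.0]
predicted_usage = [2.5, 4.0, 4.5, 2.5]
error = rmse(actual_usage, predicted_usage)
fit = r_squared(actual_usage, predicted_usage)
```

A lower RMSE and a higher R squared both indicate a better fit, which is why the two columns are read
together when ranking the algorithms.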


R squared (R²) represents the proportion of the variance in the dependent variable that is explained
by the attributes/IVs in the prediction model. The highest R² was produced by GBT (0.911). GBT also
produced the lowest root mean square error (RMSE), 0.287. Its relative error, which is not listed in
Table 3, is 5.3%, the lowest among the algorithms. Therefore, GBT is the best-performing algorithm in terms
of prediction accuracy for video-based learning adoption. The time GBT takes to complete training and
prediction, although the longest among the five algorithms, is only 5 seconds. Furthermore, it is
interesting to gain insight into the contribution of each attribute from the task-technology fit and
students’ demographics. Figure 1 presents the weight of each attribute used in the predictive model with
the GLM algorithm.


Figure 1. Weight of IVs in GLM 


The results in Figure 1 show that only five of the eight IVs contributed to the GLM prediction
model. Individual characteristics, one of the task-technology fit attributes, made no contribution to the
model. Task-tech. fit and tech. characteristics had very large weights, 0.72 and 0.691, respectively. Using
these five IVs, GLM produced a moderate level of accuracy (R² above 0.70 and RMSE below 0.35) for the
prediction model.

In contrast, as seen in Figure 2, all eight IVs were used in RF, but none of them reached a weight
above 0.6. The highest weight came from task-tech. fit (0.5653) and the lowest from the demographic
attribute monthly household income (0.007). Using all the IVs, RF performed very well as the second-best
algorithm, with an R² of 0.895 and an RMSE of 0.293.

Similarly, all IVs were used in DT, as presented in Figure 3, and all of them have very low weights.
For example, the task-tech. fit attribute contributed a weight of only 0.348, while the remaining IVs had
weights below 0.3. Therefore, as listed in Table 3, DT is less accurate than RF, with an R² of 0.765 and
an RMSE of 0.322.

Figure 4 shows the weights of all IVs in GBT, the best-performing algorithm, with an R² of 0.911 and
an RMSE of 0.287. As in RF and DT, the task-tech. fit attribute has the highest contribution to the model,
followed by tech. characteristics. Among the demographic attributes, CGPA appears to be the most important.
Based on Figures 1-4, this study reveals that even though individual characteristics contributed a very
low weight to the machine learning models, together with the other IVs it helped RF, DT, and GBT achieve
higher accuracy than GLM and SVM (refer to Figure 5). As shown in Table 3, the performance of SVM was the
worst, and the model could use only five of the eight IVs, as seen in Figure 5.


Machine learning with task-technology fit theory factors for predicting students’ ... (Suraya Masrom) 


1670 O ISSN: 2302-9285 


Figure 2. Weight of IVs in RF 

Figure 3. Weight of IVs in DT 

Figure 4. Weight of IVs in GBT 

Figure 5. Weight of IVs in SVM 


4. CONCLUSION 

This research opens up many research opportunities related to predicting video-based learning
adoption using task-technology fit theory, which comprises three attributes: task-technology fit,
technology characteristics, and individual characteristics. Based on the tested dataset, which focused on
students from a higher education institution in Malaysia, the findings show that the task-technology fit
attributes, mainly task-technology fit and technology characteristics, affected the machine learning
prediction models. Academic performance, one of the demographic attributes, also appeared as an important
factor in all the machine learning models except SVM. In general, the task-technology fit attributes had a
greater impact on the machine learning prediction models than the demographic factors. The findings of this
research hold much promise for helping educators and researchers better understand the importance of
task-technology fit before implementing video-based learning. With the help of machine learning, at-risk
students can be detected efficiently early in the semester, so that appropriate interventions can be made
to retain them in video-based learning.